Statistical Analyses of the Data from the First Year of
Use of the Student Ratings Forms of the University
of Washington Instructional Assessment System

Gerald M. Gillmore

This report presents statistical analyses of data derived from the first
year's use of the Instructional Assessment System. Included are: means,
standard deviations, and several reliability estimates for each item within
each form; inter-item correlations; and correlations of items with
non-evaluative variables. Among the major results discussed are the high
item reliabilities for all but small classes, the high inter-item
correlations and their implications for use of ratings results for diagnosis
of instructional problems, and the causal implications of item correlations
with non-evaluative variables, e.g., whether students wanted to take the
course and grade expected.

The systematic collection and dissemination of evaluative information
from students concerning the courses in which they are enrolled is a
common occurrence in American higher education. The University of
Washington (UW), as an institution, is a pioneer in this endeavor, with
efforts dating back to the 1920's. During this period of time, data have been
collected from students via a variety of means. For example, in 1951 a
procedure was implemented in which students at registration ranked their
instructors of the previous term on the basis of teaching merit.¹ More
common over this time, and still prevalent on college campuses, is the
single form containing a series of evaluative statements about a course
and instructor, which is administered at or near the end of a course.
Students either indicate the extent of their agreement to each statement,
in the classical Likert Strongly Agree to Strongly Disagree format, or
rate the "goodness" of the course in terms of each item.

The most recent major change in the Student Ratings program at the
UW occurred in the fall of 1974, when the Instructional Assessment System
(IAS) was implemented.² The purpose of this report is to present analyses
of the data collected the first year of use. To achieve this purpose in a
coherent manner, a small amount of description is necessary. The key
concepts guiding the development of the system can be briefly stated as
follows:

    First, there is an explicit recognition that student ratings
    can and do serve multiple functions, and the same evaluative
    questions are not necessarily appropriate for each. Secondly,
    there is an explicit recognition that adequate diagnostic
    information cannot be efficiently provided instructors with use of a
    common set of evaluative questions for all classes (Gillmore,
    1974, p. 1).

¹ According to P. Hodgson (1974), "The faculty quickly labeled this
procedure with the ungraceful name of Dragnet, the program fell from
favor, and after a few years was discarded by the administration" (p. 5).

² Shelley Tucker, Helen Smith, and John McMillin, along with the author,
were responsible for the forms and the items within forms of the system.
Jerry Edwards designed the optically scannable input documents, and he and
Ronald Stofer designed and wrote the computer analysis system. The entire
developmental project was wholly funded by the University of Washington
Educational Assessment Center.

The impetus for the IAS came in part from the delineation of types of
information yielded by student ratings found in Smock and Crooks (1974).
Also influential in a negative regard was my experience with use of the
same evaluative items for the diversity of classes found within a large
university.

The system contains five forms, each tailored for a broad course
type. Form A was designed for small lecture-discussion classes; Form B
for large lectures; Form C for seminar-discussion classes; Form D for
problem-solving classes; and Form E for skill-acquisition classes. Each
form has items within three sections,³ each with distinct instructions to
students. Section 1 contains 4 global evaluative items whose major
purpose is normative, i.e., to allow comparisons with various populations.
These items are common to every form. Section 2 contains 11 items that
are diagnostic in nature. They are designed to provide feedback to
instructors, useful for improving the course. These items are unique to
each form, although overlap is present. Section 3 contains 7 items which
are published for student use, with instructor permission. These items
are common to all forms. Thus, it is only Section 2, the diagnostic
items, which change from form to form. The items for all forms are
listed in Tables 2 through 7 (pages 6 through 11).

All items use a six-position response scale. The response position
labels and their numerical values are as follows: Excellent (5), Very
Good (4), Good (3), Fair (2), Poor (1), and Very Poor (0).

Additional information asked of students when they complete the
form is as follows:

    When registering, was this a course you wanted to take? Yes, Neutral,
    No.

    Is this course: In your major, In your minor or program requirement,
    A distribution requirement, An elective, Other.

    Your class: Freshman, Sophomore, Junior, Senior, Graduate, Other.

    Grade you expect to receive: A, B, C, D, E, Pass.

The results to be presented in this report include the number of
classes in which each form was used at the UW and the average class sizes,
the mean, standard deviation, and reliability of each item, the inter-item
correlations of the eleven common items (Sections 1 and 3), and correlations
of the common items with selected non-evaluative variables such as class
size and level.

³ To be more precise, the forms contain two additional sections: space
for 8 optional, instructor-chosen closed items and two open-ended
questions. Neither of these sections is relevant to the present discussion.

Data Source

All data presented in this report are derived from courses
at UW in which the IAS was used. In most cases this is an exhaustive
sample of those using the system for the academic year 1974-75, excluding
summer quarter. Where the data presented are not an exhaustive sample,
it will be so indicated. It is best to consider the data as coming from
a non-random volunteer set of courses since, strictly speaking, student
evaluations of courses using IAS are not mandatory. However, faculty are
required by Faculty Senate regulations to provide some evidence of teaching
effectiveness as perceived by students in any request for promotion or
merit pay increase. Most faculty fulfill this obligation by using a form
of the IAS for some or all of their courses. Unquestionably, some academic
departments place more pressure on faculty to use the IAS than others.
About all that can be said for certain is that the sample is neither
exhaustive of nor a random sample from all classes taught.

Results

The number of classes in which each form was used is presented in
Table 1. Also in Table 1 is the average class size for each. Class size
is defined as the number of forms completed within a given class, in this
case and throughout this report. These values are almost sure to be
somewhat less on an average than the total enrollment due to absentees at
the time of administration. However, they may be more representative of
actual attendance, which in turn may be more important than enrollment in
terms of relationships presented.

Table 1

Number and Average Size of Classes in which Each Form Was Used

Form      Number of classes    Average class size
A               1705              19.[illegible]
B                826              46.35
C                667              14.48
D                558              18.22
E                607              14.10
Total           4373              22.76

Form A, which was designed to be the most general, was clearly
the most popular choice, being selected for 39% of the classes. The
remaining forms were roughly equivalent in usage. The form designed for use
in large lecture classes (B) was used by larger classes on the average, as
expected. Both Forms C and E tended to be used in fairly small classes,
which again was as expected.

In Tables 2 through 7, means, standard deviations, and 3 reliability
estimates are presented for items within each of the forms and for the
common items across all forms. We will discuss the means and standard
deviations first, and then turn to the reliability estimates.

Means and standard deviations. Means and standard deviations were
calculated for each item using the following numerical codes: Excellent =
5, Very Good = 4, Good = 3, Fair = 2, Poor = 1, and Very Poor = 0. Thus the
most favorable possible mean value is 5.0, and the least is 0.0. Note that
the unit of analysis is classes and thus the basic datum entered into
these particular calculations is class means for given items. Hence, the
means presented are in reality means of class means and the standard
deviations are actually standard deviations of class means. They are
presented here mostly for their usefulness as a reference for specific
questions which might arise in the reader's mind. However, some general
statements can be made.
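The distinction between student-level and class-level statistics can be made concrete with a minimal sketch. The class names and responses below are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the report does not say which denominator was used.

```python
from statistics import mean, stdev

# Numerical codes for the six response positions, as given in the text.
SCALE = {"Excellent": 5, "Very Good": 4, "Good": 3,
         "Fair": 2, "Poor": 1, "Very Poor": 0}


def class_mean(responses):
    """Mean numeric rating for one class on one item."""
    return mean(SCALE[r] for r in responses)


def summarize_item(classes):
    """Return (mean of class means, SD of class means) for one item.

    `classes` maps a class identifier to that class's list of responses.
    The class, not the student, is the unit of analysis.
    """
    means = [class_mean(rs) for rs in classes.values()]
    return mean(means), stdev(means)  # stdev uses n - 1 (an assumption)


# Hypothetical responses from three classes on a single item.
example = {
    "class_1": ["Excellent", "Very Good", "Good"],     # class mean 4.0
    "class_2": ["Good", "Good", "Fair", "Very Good"],  # class mean 3.0
    "class_3": ["Excellent", "Excellent"],             # class mean 5.0
}
item_mean, item_sd = summarize_item(example)
print(item_mean, item_sd)
```

Each class contributes one value regardless of its size, so a class of two counts as much as a class of forty.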

If one compares the four general items across forms, users of Form E
received the highest average rating, Form C was next most favorably rated,
followed by A, B, and D. One-way analyses of variance show these
differences to be highly statistically significant. However, this significance
must be interpreted within the context of large numbers of classes entering
into the analysis. In fact, the specific form used only accounts for
about five percent of the total variance. Thus, although there are average
differences, there is also a great deal of overlap.
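A proportion of variance of this kind is conventionally obtained from a one-way analysis of variance as the between-groups sum of squares divided by the total sum of squares (eta squared). The sketch below assumes that is the statistic meant; the report does not name it, and the class means shown are hypothetical.

```python
from statistics import mean


def eta_squared(groups):
    """Share of total variance in class means accounted for by form used.

    `groups` maps a form label to a list of class mean ratings.
    Computed as SS_between / SS_total from a one-way analysis of variance.
    """
    all_values = [x for xs in groups.values() for x in xs]
    grand = mean(all_values)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2
                     for xs in groups.values())
    return ss_between / ss_total


# Hypothetical class means for two forms (not data from the report).
toy = {
    "A": [3.0, 4.0, 3.5, 4.5],
    "E": [3.5, 4.5, 4.0, 5.0],
}
share = eta_squared(toy)
print(round(share, 3))
```

With several thousand classes, even a share as small as five percent produces a highly significant F ratio, which is the point the paragraph above makes about interpreting significance.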

In designing the system, we felt that one of the outcomes of
distinguishing items for normative purposes from those for diagnostic
purposes would be that the latter would elicit more critical information from
students. If this were the case, one would expect means for the diagnostic
items to tend to be lower than the general items. The content of the items
is, of course, different, and thus, strictly speaking, direct comparisons
cannot be made. However, the item means within Section 2 are close in
value to those in Section 1 for all forms but E, where they tend to be
somewhat smaller.

In Section 3, which like Section 1 has common items across all forms,
Form E users were rated highest on the average on every item. This is an
interesting result considering the range of aspects of a course covered by
these items.

Over all forms, item 20 received the lowest average rating (Evaluative
and grading techniques) while item 3 received the highest average (The
instructor's contribution to the course). The highest average rating given
any item was 4.20 for item 11 on Form E (Student confidence in instructor's
knowledge). The lowest average rating (3.15) was given to item 14 on Form
B (Interest level of class sessions).

The standard deviations were quite consistent for items both across
forms and within forms. The range was from .45 for item 2 on Form E to .70
for item 15 on Form C. Most, however, fall within a tenth of a scale point
of each other.

Reliability. The coefficients of reliability which are presented can
be interpreted as indices of inter-rater agreement, with the raters in this
case being the students within a class. Perfect reliability from this
viewpoint would be achieved when all students within each class gave the same
rating to that class, and there were differences between classes. Zero
reliability, on the other hand, would indicate that the rating given by a
student would not depend upon the class he was in.

The presence of reliable ratings is essential to a successful system,
for without reliability there can be no validity. If the students who are
enrolled cannot demonstrate any consistency in how they rate the course,
then the resulting course mean ratings can have no meaning.

The reliability coefficients to be presented are intraclass
correlations (Ebel, 1951), and coefficients for a single rater were
computed using the following formula:



$$ r_1 = \frac{MS_B - MS_W}{MS_B + (k-1)\,MS_W} $$ ⁴

where $MS_B$ is the mean square between classes,
      $MS_W$ is the mean square within classes, and
      $k$ is the average class size.

As the number of students who rate a class increases, the reliability
of the resulting class means also increases as a function of the
Spearman-Brown formula:

$$ r_k = \frac{k\,r_1}{(k-1)\,r_1 + 1} $$

where $r_k$ is the reliability of a class with $k$ students and
      $r_1$ is the reliability of a single rater.
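The two formulas above can be sketched in a few lines. The function names and the toy ratings are illustrative assumptions, not part of the IAS analysis system; the mean squares are those of an ordinary one-way analysis of variance with classes as groups.

```python
from statistics import mean


def mean_squares(classes):
    """One-way ANOVA mean squares (between, within), classes as groups.

    `classes` is a list of lists, one list of numeric ratings per class.
    """
    all_ratings = [x for xs in classes for x in xs]
    grand = mean(all_ratings)
    ss_between = sum(len(xs) * (mean(xs) - grand) ** 2 for xs in classes)
    ss_within = sum((x - mean(xs)) ** 2 for xs in classes for x in xs)
    ms_between = ss_between / (len(classes) - 1)
    ms_within = ss_within / (len(all_ratings) - len(classes))
    return ms_between, ms_within


def single_rater_reliability(classes):
    """Intraclass correlation for one rater: (MSB - MSW) / (MSB + (k-1) MSW)."""
    ms_b, ms_w = mean_squares(classes)
    k = mean(len(xs) for xs in classes)  # average class size
    return (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)


def spearman_brown(r1, k):
    """Reliability of the mean of k raters, given single-rater reliability r1."""
    return k * r1 / ((k - 1) * r1 + 1)


# Hypothetical ratings on one item from three classes of three students.
toy = [[4, 5, 4], [2, 3, 2], [3, 3, 4]]
r1 = single_rater_reliability(toy)
r3 = spearman_brown(r1, 3)
print(round(r1, 3), round(r3, 3))
```

When all classes are the same size, stepping the single-rater value up to k raters gives the same number as (MS_B - MS_W) / MS_B, which is a convenient check on the arithmetic.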

Reliabilities are presented for class sizes of one student, ten students,
and 40 students. The reader can start with the value for one rater and
use the Spearman-Brown formula above to compute it for any size class. I
have chosen to present these values for a class of size 10, to represent a
typical small class, and for size 40, to represent a typical large class.

⁴ The reliability coefficients can also be viewed as generalizability
estimates with items considered finite and raters infinite (Kane et al.,
1974).

The reader is reminded that our unit of analysis is the class, which
is in reality a confounding of an instructor with a course. There is no
simple way to separate these two. The reliabilities to be presented are
those of classes, and as such give us information about the dependability
of class ratings. They do not give us information about the dependability
of ratings of courses, e.g., Economics 201, or instructors, e.g., Professor
Doe.

The reliabilities for single raters vary from .15 for item 19 on Form
E to .34 for item 3 on Form C. For classes of size 10, the reliabilities
range from .64 to .84, and for classes of 40 students, all reliabilities
are .88 and above. These data would seem to indicate that items of the
IAS are of adequate reliability for all but the very smallest of classes.
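As a check on these figures, the Spearman-Brown formula can be applied to the two single-rater extremes quoted above (.15 and .34). The sketch below is only an arithmetic illustration; the class sizes of 10, 20, and 40 are chosen to match values mentioned in the report. It gives about .64 and .84 for ten raters, and about .88 and .95 for forty.

```python
def spearman_brown(r1, k):
    """Reliability of the mean of k raters, given single-rater reliability r1."""
    return k * r1 / ((k - 1) * r1 + 1)


# Lowest and highest single-rater reliabilities reported in the text.
SINGLE_RATER_EXTREMES = (0.15, 0.34)
CLASS_SIZES = (1, 10, 20, 40)

projected = {
    (r1, k): round(spearman_brown(r1, k), 2)
    for r1 in SINGLE_RATER_EXTREMES
    for k in CLASS_SIZES
}

for r1 in SINGLE_RATER_EXTREMES:
    row = "  ".join(f"k={k}: {projected[(r1, k)]:.2f}" for k in CLASS_SIZES)
    print(f"r1={r1:.2f}  {row}")
```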

In general, the reliability estimates for Form C are somewhat higher
than those for the other forms. The reliabilities of items within Section
1 are also somewhat higher than of those within Sections 2 and 3. General
instructor effectiveness and course contribution seem to be the concepts
which are most reliably rated.

Inter-item correlations. Classes with fewer than six respondents
have been eliminated from this analysis. As seen above, smaller classes
have less reliability. Hence, the instability of the class means in small
classes might unduly influence the magnitude of the correlation coefficients.

Inter-item correlations for the 22 items could be presented for each
of the forms. This yields, however, 1155 correlation coefficients, which
is far more than any rational human being wants to know, not to mention
the awesome burden that places on a typist. We opted rather to present the
correlations among the 11 common items, across the 5 forms only, which
reduces the number of correlations to a more modest 55. Little information
is lost as a result of the selection because items within forms tend to be
highly correlated, and there does not seem to be a great deal of change in
correlational patterns from one form to another.
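The two counts follow from the number of distinct pairs among n items, n(n - 1)/2. A short sketch of the arithmetic:

```python
from math import comb

ITEMS_PER_FORM = 22
FORMS = 5
COMMON_ITEMS = 11

pairs_per_form = comb(ITEMS_PER_FORM, 2)  # distinct item pairs on one form
pairs_all_forms = pairs_per_form * FORMS  # one matrix per form
pairs_common = comb(COMMON_ITEMS, 2)      # one matrix pooled across forms

print(pairs_per_form, pairs_all_forms, pairs_common)
```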

To illustrate the high inter-item correlations, the average off-diagonal
correlations between the 11 items of Section 2 for the five forms are as
follows: Form A = .73; Form B = .77; Form C = [illegible]; Form D = .72;
Form E = .66.

The correlations among the 11 general items, 4 from Section 1 and 7
from Section 3, across all forms are found in Table 8. One can immediately
note that the values tend to be fairly high, ranging from .54 (items 20 and
21) to .95 (items 3 and 4). Items 3 and 4 are the general instructor items
and seem to be eliciting highly similar ratings from students. If the
rating of the "Course as a whole" (item 1) is viewed as the most general of
any items, it is interesting to note which items correlate most highly with
it: the content of the course, amount learned in the course, and the two
general instructor items, in that order.

A close look at the inter-item correlations reveals some clustering.
Items 18 and 19 correlate with each other a bit higher than they
correlate with the other items. Items 3, 4, and 17 also seem to go
together. The former items deal with the course and its content, the latter
the instructor. Finally there is a slight tendency for items 20, 21, and
22 to form a cluster. These items have to do with grading, assigned work,
and student responsibilities.

The presence of these clusters seems satisfactory intuitively. However,
one should not overlook the magnitudes of all of the inter-item correlations,
which are consistently high. In other words, the three clusters which have
been identified are highly correlated with each other.⁵

Correlations with Non-evaluative Variables. Previously, a list of
variables was presented which represented the information solicited from
students at the top of each form. To this list, we add some additional
variables, which also are not directly evaluative, but could be of interest.

Class size. Actually this variable is the number of forms which were
filled out within each class. This value is not identical to class size,
but surely highly correlated. In fact, one could make the case that it is
actually more representative of the number of students who attend a given
class, and thus of greater relevance. We also entered the square root of
the above value into the analysis, thinking that the student's
psychological perception of the bigness of a class might not actually be linear
with the actual size, but rather increase as a function more closely
representing the square root, e.g., 100 seems twice as large as 25 for
instructional purposes, not four times as large.

⁵ One might wonder why factor analytic techniques were not applied to
these data. It is my contention that the full correlation matrix is much
more informative and less misleading than a factor loading matrix. In the
latter case, there is a distinct leveling and sharpening effect, especially
with use of orthogonal rotations, which tend to distort the importance of
a general factor.
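The effect of the square root transformation on the example just given can be sketched as follows. The function name is illustrative only.

```python
from math import sqrt


def perceived_size_ratio(larger, smaller):
    """Ratio of two class sizes after the square root transformation."""
    return sqrt(larger) / sqrt(smaller)


raw_ratio = 100 / 25                              # four times as large
transformed_ratio = perceived_size_ratio(100, 25)  # twice as large
print(raw_ratio, transformed_ratio)
```

The transformation compresses differences among large classes while leaving differences among small classes comparatively intact.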

Publication questions. At the UW, instructors have a choice as to
whether they want their results sent to their chairman, their dean, and
the results of items 16-22 published for student use. These decisions are
made at the time the forms are administered to the class by the instructor
and each decision is made independently of the other two. Each of the
three variables was dichotomously scored, 0 if not published, 1 if
published. For the classes used in this study, the results of 63% of the
surveys were sent to the chairman, 33% were sent to deans, and 53% were
published for student use.

Course level. Finally, course level was entered as a variable. We
used the first digit of the course number for this value. At the UW, 100
through 300 level courses are for undergraduates, with a few graduate
students enrolled in 300 level courses. Four-hundred and 500 level courses are
for graduate students, with some advanced undergraduates enrolling in the
former. Even though students are asked if the course is within their major,
etc., at the top of each form, these data were not entered into the analysis
due to difficulty in scaling the item.⁶ Thus, we are left with 9 variables,
the list of which, along with how each was coded for analysis, is presented
in Table 9. The correlations of these variables with the 11 common
evaluative items are presented in Table 10. Close examination of the
correlations of these variables with the items of the five individual forms
failed to yield much information beyond that shown by the correlations with
the common items across all forms. As above, only those classes in which
six or more students responded to a given item are included in this
analysis.

⁶ The reader should again recall that the unit of analysis is the course.
Thus, a single value for each variable is entered into the correlations for
each course. What this value could be for the major-elective variable is
not clear. Arguments could probably be made for using the modal response,
i.e., let's consider the course in terms of what the greatest number of
students are taking it for. However, this ignores all students who are
enrolled for a "non-modal" reason. Also, we could opt to define a
continuous type variable which we could define as requiredness, or something
like that. However, any such construction placed upon this variable would
be potentially misleading. Thus, for the present study, I chose to ignore
this variable.
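The class-level correlations described here can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout (`ratings`, `wanted`) and the data are hypothetical, and the Pearson product-moment coefficient is assumed because the report does not name the coefficient used.

```python
from math import sqrt
from statistics import mean

MIN_RESPONDENTS = 6  # classes with fewer respondents are excluded


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def class_level_correlation(classes):
    """Correlate class means of a coded variable with class mean ratings.

    Each class is a dict with 'ratings' (0-5 codes for one item) and
    'wanted' (3 = Yes, 2 = Neutral, 1 = No). Both keys are hypothetical.
    """
    kept = [c for c in classes if len(c["ratings"]) >= MIN_RESPONDENTS]
    x = [mean(c["wanted"]) for c in kept]
    y = [mean(c["ratings"]) for c in kept]
    return pearson(x, y), len(kept)


# Hypothetical classes; the last one is dropped for having two respondents.
toy = [
    {"ratings": [4, 4, 4, 4, 4, 4], "wanted": [3, 3, 3, 3, 3, 3]},
    {"ratings": [3, 4, 3, 4, 3, 4], "wanted": [2, 3, 2, 3, 2, 3]},
    {"ratings": [3, 3, 3, 3, 3, 3], "wanted": [2, 2, 2, 2, 2, 2]},
    {"ratings": [0, 0], "wanted": [3, 3]},
]
r, n_kept = class_level_correlation(toy)
print(r, n_kept)
```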
From the correlations of the general evaluative items with the
selected non-evaluative variables, one can see that the highest values are
with the variable "When enrolling, was this a course you wanted to take?"
This is particularly impressive given the fact the mean for this variable
was 2.73, thus revealing a high degree of positive response. (These data
evidence that by and large students at the UW wanted to take the classes in
which they are enrolled.) These correlations are higher with the items
relating to the course and its content than those relating to the
instructor and his decisions.

The average expected grade for a class is also positively correlated
with all the common items. The highest correlation is with "Evaluative
and grading techniques" (r = .43), thus indicating that the higher the
average expected grade, the better liked the grading technique.

The average class (i.e., the students' year in school) and course level
are generally correlated positively, but weakly, with all items except for
"Use of class time," which is slightly negative. The largest correlation
for both variables is with "Relevance and usefulness of course content."
Apparently higher level courses tend to be considered more relevant by
students. Generally, the correlations with "content" items are greater
than with "instructor" items. Class size, on the other hand, generally
shows a small negative correlation, with the correlations yielded by the
square root transformation being slightly larger in magnitude. The
strongest relationship is with the item "Instructor interest in whether
students learned." "Evaluative and grading techniques" is only slightly
smaller.

Finally, each of the three publication questions is correlated
positively with all items, although the magnitudes are consistently small,
indicating that chairmen, deans, and students are not receiving a highly
biased set of evaluations.

Table 9

Non-evaluative Variables

Variable                  Response categories and code
Wanted to take course     Yes = 3, Neutral = 2, No = 1
Course level              100 level = 1, 200 level = 2, 300 level = 3,
                          400 level = 4, 500 level = 5
Class                     Freshman = 1, Sophomore = 2, Junior = 3,
                          Senior = 4, Graduate = 5
Expected grade            [codes illegible]
Class size                Actual number of questionnaires
Square root class size    Square root of above
Chairman copy             Yes = 1, No or omit = 0
Dean copy                 Yes = 1, No or omit = 0
Student report            Yes = 1, No or omit = 0

Table 10

Correlations between Common Items and Selected Non-evaluative
Variables across All Forms*

                                              Item
Variable                   1     2     3     4    16    17    18    19    20    21    22
Wanted to take course     .42   .44   .29   .29   .29   .26   .41   .45   .26   .26   .24
Course level              .09   [?]   .02   .01  -.05   .05   .07   .18   .06  -.01  -.03
Class                     .10   [?]   [?]   .03  -.04   .07   .07   .17   .07   .01  -.02
Expected grade            .34   [?]   [?]   .28   .17   .34   .31   .32   .43   .35   .31
Class size               -.09   [?]  -.09  -.07  -.03  -.17  -.11  -.11  -.16  -.11  -.06
Square root class size   -.15  -.13  -.14  -.12  -.07  -.23  -.16  -.17  -.21  -.15  -.09
Chairman copy             .05   .05   .06   .06   .06   .06   .07   .06   .06   .03   .05
Dean copy                 .11   .11   .10   .11   .08   .08   .11   .12   .09   .07   .07
Student report            .09   .09   .11   .12   .11   .10   .11   .08   .10   .10   .12

*See Table 7 or 8 for item wordings.

Discussion

There are many potential areas of discussion embedded in the analyses 
which have been presented. I have chosen to focus on three such areas, 
which are: the reliability of items, the extent to which items are diag- 
nostic, and biases or factors outside the course and instructor which 
might affect ratings. 

The reliability of items. As mentioned earlier, inter-rater
reliability is a very necessary attribute of a successful instrument.
Indeed, individual items should be evaluated in this regard and sub-standard
items discarded. Reliability is not sufficient, of course, since one can
reliably measure something which has no relationship to the purpose of the
measurements.

Based on the data presented, two assertions seem warranted. First,
there are no "bad" items in this regard, i.e., every item seems to have
adequate reliability. Even the least reliable item reaches .64 with only
10 raters and .78 with 20 raters. Secondly, as results from smaller and
smaller classes are considered, more items become of questionable
reliability, and interpretation becomes more suspect.

It is also worth mentioning the tendency for the items from Form C to
have higher reliabilities than those of the other forms, because this
might help dispel a myth extant within higher education. The myth is that
seminar type classes are universally easy to teach and liked by students.
These data strongly suggest that students are able to consistently
discriminate good seminars, from their point of view, from less good seminars.
Evidently students are able to better make this discrimination than they
can for other types of courses.

Are Items Diagnostic?

Section 2 items were designed to be diagnostic. By this I do not
necessarily mean that these items can reveal specific instructional
problems, but that these items can reveal areas of problems which can then be
looked at more closely. This corresponds to what Smock and Crooks (1974)
called level II items. These items can be contrasted with the general
items (Section 1) which are indicative of overall quality, but give no
hint of where the problems may lie.

We hoped that the directions given to students for the items in
Section 2 would elicit more critical information than yielded by items in
Section 1. It seemed reasonable to suspect that one could criticize
someone more easily in order to help him improve than if it would be
information mostly useful in determining a person's promotion or termination. As
mentioned in the results section, the closeness in magnitude of item means
between the sections indicates that this attempt was probably not
successful.

However, the items within Section 2 still could be achieving a
diagnostic function. In such a case, we would expect an instructor to be
rated favorably on some of the items in Section 2 and not so favorably on
others. The mix would depend upon the extent of his instructional
problems and the specific items on a form. What we would not expect is all
items rated at a level roughly equal to the rating of the general item
for a given instructor. Equivalently, we would not expect high inter-item
correlations within this section. But, as reported earlier, we do get
fairly high inter-item correlations. These correlations can be described
as a "halo" effect, which is "...the tendency, in making an estimate or
rating of one characteristic of a person, to be influenced by another
characteristic or by one's general impression of that person" (English &
English, 1958, p. 236). Insofar as there is a halo effect operating, then
the items cannot be diagnostic.

The diagnosis versus halo question cannot be fully resolved from the
data at hand. Even though the inter-item correlations are high, there is
still specific variance within each item, that is, variance attributable
neither to that which is in common with other variables nor to that which
is measurement error. Furthermore, items with similar content correlate
more highly than items with dissimilar content. To illustrate, on Form A
item 7 (Explanations by instructor were) and item 8 (Instructor's ability
to present alternative explanations when needed) correlated at .92. But
these two items correlated with item 13 (Encouragement given students to
express themselves) at .69 and .72 respectively. Thus, there is some
differential responding by students.

Another point which can be made is that when thinking in terms of
a halo effect, it is easy to lose sight of the possibility that one who
does well in his teaching in one area also tends to do well in other areas,
and vice versa. In other words, the halo may be, in fact, an accurate
perception. If we choose to accept this view, but also allow for some
exceptions, i.e., a given instructor may be strong in most areas, but weak
in a few, or vice versa, then the pattern of correlations obtained is just
what we would expect.

It seems too early to give up the notion that student rating
information can be diagnostic. On the other hand, there is reason for
pessimism in this regard. Certainly diagnostic clues may be provided by
open-ended comments from students. If these are followed by more
detailed and precise closed-ended questions, more successful diagnosis
may result. Further research is needed to determine the diagnostic value
of student ratings items both within Section 2 and in general.

Biases. One hopes that ratings given a course are reflective of the
content and teaching of that course, and not influenced greatly by
non-instructional factors. We have isolated four variables which seem to
relate to ratings to a non-trivial extent: whether, when registering,
students wanted to take the class, expected grades, the evaluation form
used, and class size. These are ordered in terms of apparent importance.
Showing a relationship is one thing, however, and understanding it
causally is quite another.

The question, "When registering, was this a course you wanted to
take?" seems to be the potentially most important. Not only does it
account for more variance than any of the others (almost 20% for some
items), it is also information which could be collected at the beginning
of the course, and later ratings could then be appropriately adjusted.
The causation can only go one way. The only reservation about this
variable is the extent to which the reputation of the course or
instructor influences whether or not students want to take it or him. If the
correlation were so explained, then clearly adjustments in end-of-course
ratings are not appropriate.
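The "almost 20%" figure is the square of the correlation coefficient, which gives the proportion of variance in one variable accounted for by a linear relation with the other. A brief sketch using the "Wanted to take course" correlations that are legible in Table 10:

```python
def variance_explained(r):
    """Proportion of variance accounted for by a correlation of size r."""
    return r * r


# Largest and smallest "Wanted to take course" correlations in Table 10.
wanted_correlations = (0.45, 0.44, 0.42, 0.41, 0.24)
shares = {r: round(variance_explained(r), 2) for r in wanted_correlations}

for r, share in shares.items():
    print(f"r = {r:.2f}  ->  {share:.0%} of variance")
```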

The relation between ratings and expected grade is an explosive
issue. The argument goes that the way to get high ratings is to promise
students high grades. However, one could also argue that if students
like the course they will work harder and get better grades, or a well
taught course will result in both more learning, and hence higher grades,
and high ratings. For any of the above, a positive correlation would
result.

The fact that instructors using Form E tend to get somewhat higher
ratings is equally fuzzy in interpretation. It is possible that there is
something inherent in skill-acquisition type courses which students like
better. Also, it is equally plausible that instructors tend to work a bit
harder to put this type of course together, or perhaps tend to be more
student oriented, which is reflected in their willingness to teach this
kind of course. If one chooses to make any of these arguments, he must
also be willing to make the same argument in a negative sense concerning
problem-solving courses, since they come out at the bottom of the heap.
One thing is clear. It is the course which is important, not the form,
since it is the common items on which Form E users are higher, not the
diagnostic items. Thus, the act of choosing Form E alone does not help
anybody's ratings.

Finally, I mention class size. In actuality, it is not a major
influence on ratings, possibly much less of an influence than people think.
Perhaps the most interesting aspect is that large classes do not
automatically lead to low ratings.